Skip to content

Conversation

ReeceStevens
Copy link
Contributor

Description

Related issues: I haven't opened an issue for this yet, nor did I see a related issue already open.

Hello pydicom team! Thanks for your excellent work on this library. I ran across an issue in production for which I wanted to push an upstream fix.

During header deidentification, deeply nested tag values (i.e. ReferencedSeriesSequence > 0 > SeriesInstanceUID) were being processed and saved back to the top-level tag (i.e, just the top-level SeriesInstanceUID). This was causing some very confusing and surprising behavior with UIDs not being converted to the expected values.

I added a minimal reproducible test case (1f9cb7e) which fails pre-patch. Once you apply commit 8b646e5, the test passes.

Please let me know if you have any questions or concerns about the patch! Thanks!

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

Open questions

None

This test exercises the corner case where a deeply-nested
SeriesInstanceUID attribute can be incorrectly saved back to the
top-level SeriesInstanceUID attribute, overwriting its value.
In cases where tags are deeply nested within sequences, the output of
deid functions were incorrectly being assigned to the top-level
tags with matching names. For example, a nested SeriesInstanceUID within
a ReferencedSeriesSequence was overwriting the top-level
SeriesInstanceUID value.

This was resolved by traversing the tree of tags and setting the nested
element, rather than the top-level element.
@vsoch
Copy link
Member

vsoch commented Apr 11, 2025

@ReeceStevens for black we have it pinned to black-23.3.0 - if you can pip install that version in an environment and run on the code, it should fix the failed test.

@ReeceStevens
Copy link
Contributor Author

Thanks @vsoch! Updated and pushed.

Comment on lines 472 to 473
else:
return int(tag_string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this a bit - if the first statement returns, the else isn't needed.

Suggested change
else:
return int(tag_string)
return int(tag_string)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

if tag_string.startswith("("):
result = parentheses_hex_tag_format.match(tag_string)
return int(f"0x{result.group(1)}{result.group(2)}", base=16)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (per my comment below). If we return in the first statement, then the else here isn't needed, and you can unindent all of the following logic and make it easier for the developer (us!) to read.

Comment on lines 478 to 481
# A string representation of the full tag path, including parent sequences
full_tag_path = field
if hasattr(field, "uid"):
full_tag_path = field.uid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this little section doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here for more clarity. This handles the edge case that field can be a string or an object with a uid property containing the tag path.

else:
self.dicom.add(element)
dataset_cursor = self.dicom
*parent_tags, last_tag = full_tag_path.split("__")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here about how this data is unwrapped, and why the last_tag is special/different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added! Let me know if you'd like me to expand on it more.

@@ -0,0 +1,104 @@
import contextlib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job to add a full test! 👏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! :)

@ReeceStevens
Copy link
Contributor Author

@vsoch I believe I've addressed all your comments in 9793a48-- please let me know if anything else remains unresolved.

@ReeceStevens ReeceStevens requested a review from vsoch April 14, 2025 09:40
Copy link
Member

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for merge! The last final tweaks:

  • please bump the version in version.py
  • add a corresponding note in the CHANGELOG.md

@ReeceStevens ReeceStevens requested a review from vsoch April 15, 2025 09:28
@ReeceStevens
Copy link
Contributor Author

@vsoch ready!

@vsoch vsoch merged commit 88f0798 into pydicom:master Apr 15, 2025
3 checks passed
@ReeceStevens
Copy link
Contributor Author

@vsoch thanks for helping shepherd this through review! Do you know if you'll be able to publish the patch version to pypi some time soon? Wasn't sure if there was a regular release cadence you followed for this.

@vsoch
Copy link
Member

vsoch commented Apr 17, 2025

Thank you for the reminder!

https://pypi.org/project/deid/0.4.2/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants